National generation and fuel consumption

The data in this notebook is generation and consumption by fuel type for the entire US. These values are larger than what would be calculated by summing facility-level data. Note that the fuel types are somewhat aggregated (coal rather than BIT, SUB, LIG, etc). So when we multiply the fuel consumption by an emissions factor there will be some level of error.

The code assumes that you have already downloaded an ELEC.txt file from EIA's bulk download website.


In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import io, time, json
import pandas as pd
import os
import numpy as np
import math

Read ELECT.txt file


In [3]:
path = os.path.join('Raw data', 'Electricity data', '2017-03-15 ELEC.txt')
with open(path, 'rb') as f:
    raw_txt = f.readlines()

Filter lines to only include total generation and fuel use

Only want monthly US data for all sectors

  • US-99.M
  • ELEC.GEN, ELEC.CONS_TOT_BTU, ELEC.CONS_EG_BTU
  • not ALL

Fuel codes:

  • WWW, wood and wood derived fuels
  • WND, wind
  • STH, solar thermal
  • WAS, other biomass
  • TSN, all solar
  • SUN, utility-scale solar
  • NUC, nuclear
  • NG, natural gas
  • PEL, petroleum liquids
  • SPV, utility-scale solar photovoltaic
  • PC, petroluem coke
  • OTH, other
  • COW, coal,
  • DPV, distributed photovoltaic
  • OOG, other gases
  • HPS, hydro pumped storage
  • HYC, conventional hydroelectric
  • GEO, geothermal
  • AOR, other renewables (total)

In [4]:
def line_to_df(line):
    """
    Takes in a line (dictionary), returns a dataframe
    """
    for key in ['latlon', 'source', 'copyright', 'description', 
                'geoset_id', 'iso3166', 'name', 'state']:
        line.pop(key, None)

    # Split the series_id up to extract information
    # Example: ELEC.PLANT.GEN.388-WAT-ALL.M
    series_id = line['series_id']
    series_id_list = series_id.split('.')
    # Use the second to last item in list rather than third
    plant_fuel_mover = series_id_list[-2].split('-')
    line['type'] = plant_fuel_mover[0]
#     line['state'] = plant_fuel_mover[1]
    line['sector'] = plant_fuel_mover[2]
    temp_df = pd.DataFrame(line)

    try:
        temp_df['year'] = temp_df.apply(lambda x: x['data'][0][:4], axis=1).astype(int)
        temp_df['month'] = temp_df.apply(lambda x: x['data'][0][-2:], axis=1).astype(int)
        temp_df['value'] = temp_df.apply(lambda x: x['data'][1], axis=1)
        temp_df.drop('data', axis=1, inplace=True)
        return temp_df
    except:
        exception_list.append(line)
        pass

In [5]:
exception_list = []
gen_rows = [row for row in raw_txt if 'ELEC.GEN' in row and 'series_id' in row and 'US-99.M' in row and 'ALL' not in row]
total_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_TOT_BTU' in row and 'series_id' in row and 'US-99.M' in row and 'ALL' not in row]
eg_fuel_rows = [row for row in raw_txt if 'ELEC.CONS_EG_BTU' in row and 'series_id' in row and 'US-99.M' in row and 'ALL' not in row]

All generation by fuel


In [6]:
gen_df = pd.concat([line_to_df(json.loads(row)) for row in gen_rows])

In [7]:
#drop
gen_df.head()


Out[7]:
end f geography last_updated sector series_id start type units year month value
0 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR thousand megawatthours 2016 12 32426.66031
1 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR thousand megawatthours 2016 11 28579.08298
2 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR thousand megawatthours 2016 10 29919.29294
3 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR thousand megawatthours 2016 9 26695.98579
4 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR thousand megawatthours 2016 8 24509.79210

Multiply generation values by 1000 and change the units to MWh


In [8]:
gen_df.loc[:,'value'] *= 1000
gen_df.loc[:,'units'] = 'megawatthours'

In [9]:
gen_df['datetime'] = pd.to_datetime(gen_df['year'].astype(str) + '-' + gen_df['month'].astype(str), format='%Y-%m')
gen_df['quarter'] = gen_df['datetime'].dt.quarter
gen_df.rename_axis({'value':'generation (MWh)'}, axis=1, inplace=True)

In [10]:
#drop
gen_df.head()


Out[10]:
end f geography last_updated sector series_id start type units year month generation (MWh) datetime quarter
0 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 12 32426660.31 2016-12-01 4
1 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 11 28579082.98 2016-11-01 4
2 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 10 29919292.94 2016-10-01 4
3 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 9 26695985.79 2016-09-01 3
4 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 8 24509792.10 2016-08-01 3

In [11]:
#drop
gen_df.loc[gen_df['type']=='OOG'].head()


Out[11]:
end f geography last_updated sector series_id start type units year month generation (MWh) datetime quarter
0 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 12 1007405.42 2016-12-01 4
1 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 11 1001151.99 2016-11-01 4
2 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 10 891165.18 2016-10-01 4
3 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 9 1050049.21 2016-09-01 3
4 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 8 1101829.28 2016-08-01 3

Total fuel consumption


In [12]:
total_fuel_df = pd.concat([line_to_df(json.loads(row)) for row in total_fuel_rows])

In [13]:
#drop
total_fuel_df.head()


Out[13]:
end f geography last_updated sector series_id start type units year month value
0 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW million MMBtu 2016 12 1260.87651
1 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW million MMBtu 2016 11 934.68259
2 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW million MMBtu 2016 10 1057.00563
3 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW million MMBtu 2016 9 1216.53859
4 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW million MMBtu 2016 8 1450.03624

Multiply generation values by 1,000,000 and change the units to MMBtu


In [14]:
total_fuel_df.loc[:,'value'] *= 1E6
total_fuel_df.loc[:,'units'] = 'mmbtu'

In [15]:
total_fuel_df['datetime'] = pd.to_datetime(total_fuel_df['year'].astype(str) + '-' + total_fuel_df['month'].astype(str), format='%Y-%m')
total_fuel_df['quarter'] = total_fuel_df['datetime'].dt.quarter
total_fuel_df.rename_axis({'value':'total fuel (mmbtu)'}, axis=1, inplace=True)

In [16]:
#drop
total_fuel_df.head()


Out[16]:
end f geography last_updated sector series_id start type units year month total fuel (mmbtu) datetime quarter
0 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 12 1.260877e+09 2016-12-01 4
1 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 11 9.346826e+08 2016-11-01 4
2 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 10 1.057006e+09 2016-10-01 4
3 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 9 1.216539e+09 2016-09-01 3
4 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 8 1.450036e+09 2016-08-01 3

In [17]:
#drop
total_fuel_df.loc[total_fuel_df['type']=='OOG'].head()


Out[17]:
end f geography last_updated sector series_id start type units year month total fuel (mmbtu) datetime quarter

Electric generation fuel consumption


In [18]:
eg_fuel_df = pd.concat([line_to_df(json.loads(row)) for row in eg_fuel_rows])

In [19]:
#drop
eg_fuel_df.head()


Out[19]:
end f geography last_updated sector series_id start type units year month value
0 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW million MMBtu 2016 12 1236.23483
1 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW million MMBtu 2016 11 913.35719
2 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW million MMBtu 2016 10 1036.24820
3 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW million MMBtu 2016 9 1195.13663
4 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW million MMBtu 2016 8 1426.01303

Multiply generation values by 1,000,000 and change the units to MMBtu


In [20]:
eg_fuel_df.loc[:,'value'] *= 1E6
eg_fuel_df.loc[:,'units'] = 'mmbtu'

In [21]:
eg_fuel_df['datetime'] = pd.to_datetime(eg_fuel_df['year'].astype(str) + '-' + eg_fuel_df['month'].astype(str), format='%Y-%m')
eg_fuel_df['quarter'] = eg_fuel_df['datetime'].dt.quarter
eg_fuel_df.rename_axis({'value':'elec fuel (mmbtu)'}, axis=1, inplace=True)

In [22]:
#drop
eg_fuel_df.head()


Out[22]:
end f geography last_updated sector series_id start type units year month elec fuel (mmbtu) datetime quarter
0 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW mmbtu 2016 12 1.236235e+09 2016-12-01 4
1 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW mmbtu 2016 11 9.133572e+08 2016-11-01 4
2 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW mmbtu 2016 10 1.036248e+09 2016-10-01 4
3 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW mmbtu 2016 9 1.195137e+09 2016-09-01 3
4 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_EG_BTU.COW-US-99.M 200101 COW mmbtu 2016 8 1.426013e+09 2016-08-01 3

Combine three datasets

Need to estimate fuel use for OOG, because EIA doesn't include any (this is only ~2% of OOG fuel for electricity in 2015).


In [23]:
merge_cols = ['type', 'year', 'month']

fuel_df = total_fuel_df.merge(eg_fuel_df[merge_cols+['elec fuel (mmbtu)']], 
                              how='outer', on=merge_cols)
fuel_df.head()


Out[23]:
end f geography last_updated sector series_id start type units year month total fuel (mmbtu) datetime quarter elec fuel (mmbtu)
0 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 12 1.260877e+09 2016-12-01 4 1.236235e+09
1 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 11 9.346826e+08 2016-11-01 4 9.133572e+08
2 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 10 1.057006e+09 2016-10-01 4 1.036248e+09
3 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 9 1.216539e+09 2016-09-01 3 1.195137e+09
4 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.CONS_TOT_BTU.COW-US-99.M 200101 COW mmbtu 2016 8 1.450036e+09 2016-08-01 3 1.426013e+09

Not seeing the issue that shows up with facilities, where some facilities have positive total fuel consumption but no elec fuel consumption


In [24]:
#drop
fuel_df.loc[~(fuel_df['elec fuel (mmbtu)']>=0)]


Out[24]:
end f geography last_updated sector series_id start type units year month total fuel (mmbtu) datetime quarter elec fuel (mmbtu)

In [25]:
gen_fuel_df = gen_df.merge(fuel_df[merge_cols+['total fuel (mmbtu)', 'elec fuel (mmbtu)']], 
                           how='outer', on=merge_cols)
gen_fuel_df.head()


Out[25]:
end f geography last_updated sector series_id start type units year month generation (MWh) datetime quarter total fuel (mmbtu) elec fuel (mmbtu)
0 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 12 32426660.31 2016-12-01 4 NaN NaN
1 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 11 28579082.98 2016-11-01 4 NaN NaN
2 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 10 29919292.94 2016-10-01 4 NaN NaN
3 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 9 26695985.79 2016-09-01 3 NaN NaN
4 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.AOR-US-99.M 200101 AOR megawatthours 2016 8 24509792.10 2016-08-01 3 NaN NaN

No records with positive fuel use but no generation


In [26]:
#drop
gen_fuel_df.loc[gen_fuel_df['generation (MWh)'].isnull()]


Out[26]:
end f geography last_updated sector series_id start type units year month generation (MWh) datetime quarter total fuel (mmbtu) elec fuel (mmbtu)

Add CO2 emissions

The difficulty here is that EIA combines all types of coal fuel consumption together in the bulk download and API. Fortunately the emission factors for different coal types aren't too far off on an energy basis (BIT is 93.3 kg/mmbtu, SUB is 97.2 kg/mmbtu). I'm going to average the BIT and SUB factors rather than trying to do something more complicated. In 2015 BIT represented 45% of coal energy for electricity and SUB represented 48%.

Same issue with petroleum liquids. Using the average of DFO and RFO, which were the two largest share of petroleum liquids.


In [27]:
path = os.path.join('Clean data', 'Final emission factors.csv')
ef = pd.read_csv(path, index_col=0)

In [28]:
#drop
ef.index


Out[28]:
Index([u'BIT', u'DFO', u'GEO', u'JF', u'KER', u'LIG', u'MSW', u'NG', u'PC',
       u'PG', u'RC', u'RFO', u'SGC', u'SGP', u'SUB', u'TDF', u'WC', u'WO',
       u'BFG', u'MSN', u'SC', u'OG', u'AB', u'BLQ', u'LFG', u'MSB', u'NUC',
       u'OBG', u'OBL', u'OBS', u'OTH', u'PUR', u'SLW', u'SUN', u'WAT', u'WDL',
       u'WDS', u'WH', u'WND'],
      dtype='object', name=u'EIA Fuel Code')

In [29]:
#drop
gen_fuel_df['type'].unique()


Out[29]:
array([u'AOR', u'DPV', u'GEO', u'HPS', u'HYC', u'PC', u'COW', u'OOG',
       u'OTH', u'NUC', u'NG', u'WAS', u'STH', u'SPV', u'PEL', u'TSN',
       u'SUN', u'WWW', u'WND'], dtype=object)

Match general types with specific fuel codes

Fuel codes:

  • WWW, wood and wood derived fuels
  • WND, wind
  • STH, solar thermal
  • WAS, other biomass
  • TSN, all solar
  • SUN, utility-scale solar
  • NUC, nuclear
  • NG, natural gas
  • PEL, petroleum liquids
  • SPV, utility-scale solar photovoltaic
  • PC, petroluem coke
  • OTH, other
  • COW, coal,
  • DPV, distributed photovoltaic
  • OOG, other gases
  • HPS, hydro pumped storage
  • HYC, conventional hydroelectric
  • GEO, geothermal
  • AOR, other renewables (total)

In [30]:
#drop
ef.loc['NG', 'Fossil Factor']


Out[30]:
53.07

In [31]:
fuel_factors = {'NG' : ef.loc['NG', 'Fossil Factor'],
                   'PEL': ef.loc[['DFO', 'RFO'], 'Fossil Factor'].mean(),
                   'PC' : ef.loc['PC', 'Fossil Factor'], 
                   'COW' : ef.loc[['BIT', 'SUB'], 'Fossil Factor'].mean(),
                   'OOG' : ef.loc['OG', 'Fossil Factor']}

In [32]:
#drop
fuel_factors


Out[32]:
{'COW': 95.25,
 'NG': 53.07,
 'OOG': 59.0,
 'PC': 102.09999999999999,
 'PEL': 75.975}

In [33]:
# Start with 0 emissions in all rows
# For fuels where we have an emission factor, replace the 0 with the calculated value
gen_fuel_df['all fuel CO2 (kg)'] = 0
gen_fuel_df['elec fuel CO2 (kg)'] = 0
for fuel in fuel_factors.keys():
    gen_fuel_df.loc[gen_fuel_df['type']==fuel,'all fuel CO2 (kg)'] = \
        gen_fuel_df.loc[gen_fuel_df['type']==fuel,'total fuel (mmbtu)'] * fuel_factors[fuel]
        
    gen_fuel_df.loc[gen_fuel_df['type']==fuel,'elec fuel CO2 (kg)'] = \
        gen_fuel_df.loc[gen_fuel_df['type']==fuel,'elec fuel (mmbtu)'] * fuel_factors[fuel]

In [34]:
gen_fuel_df.loc[gen_fuel_df['type']=='COW',:].head()


Out[34]:
end f geography last_updated sector series_id start type units year month generation (MWh) datetime quarter total fuel (mmbtu) elec fuel (mmbtu) all fuel CO2 (kg) elec fuel CO2 (kg)
996 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.COW-US-99.M 200101 COW megawatthours 2016 12 1.187902e+08 2016-12-01 4 1.260877e+09 1.236235e+09 1.200985e+11 1.177514e+11
997 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.COW-US-99.M 200101 COW megawatthours 2016 11 8.699971e+07 2016-11-01 4 9.346826e+08 9.133572e+08 8.902852e+10 8.699727e+10
998 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.COW-US-99.M 200101 COW megawatthours 2016 10 9.933756e+07 2016-10-01 4 1.057006e+09 1.036248e+09 1.006798e+11 9.870264e+10
999 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.COW-US-99.M 200101 COW megawatthours 2016 9 1.142817e+08 2016-09-01 3 1.216539e+09 1.195137e+09 1.158753e+11 1.138368e+11
1000 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.COW-US-99.M 200101 COW megawatthours 2016 8 1.358110e+08 2016-08-01 3 1.450036e+09 1.426013e+09 1.381160e+11 1.358277e+11

In [35]:
#drop
gen_fuel_df.loc[gen_fuel_df['type']=='OOG'].head()


Out[35]:
end f geography last_updated sector series_id start type units year month generation (MWh) datetime quarter total fuel (mmbtu) elec fuel (mmbtu) all fuel CO2 (kg) elec fuel CO2 (kg)
1188 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 12 1007405.42 2016-12-01 4 NaN NaN NaN NaN
1189 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 11 1001151.99 2016-11-01 4 NaN NaN NaN NaN
1190 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 10 891165.18 2016-10-01 4 NaN NaN NaN NaN
1191 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 9 1050049.21 2016-09-01 3 NaN NaN NaN NaN
1192 201612 M USA 2017-03-03T09:16:27-05:00 99 ELEC.GEN.OOG-US-99.M 200101 OOG megawatthours 2016 8 1101829.28 2016-08-01 3 NaN NaN NaN NaN

Export data


In [36]:
path = os.path.join('Clean data', 'EIA country-wide gen fuel CO2.csv')
gen_fuel_df.to_csv(path, index=False)

In [ ]:


In [ ]: